Lexicon Acquisition with and for Symbolic NLP-Systems – a Bootstrapping Approach

Authors

  • Jonas Kuhn
  • Judith Eckle-Kohler
  • Christian Rohrer
Abstract

We present a method of applying a broad-coverage LFG grammar of German in the process of semi-automatic lexicon acquisition from corpora. The identification of corpus instances that illustrate a certain subcategorization frame uniquely is done by a comparison of the numbers of analyses the grammar assigns to the corpus instances, under the assumption of different hypothetical lexicon entries for the candidate verb. Filtering conditions expressed on the feature representation output by the grammar further restrict the sentences that the automatic extraction step is based on. Experiments show that the grammar-based method produces better results than a method based on patterns in a corpus query language.
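The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact comparison criterion, so the code assumes a sentence counts as unique evidence for a frame when exactly one hypothetical entry yields any analysis. The parser interface `parse`, the filter `keep`, the frame labels, and the toy data are all hypothetical stand-ins for the LFG grammar and its feature-structure output.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical parser interface (an assumption, not a real API):
# (sentence, verb, frame) -> list of analyses, each a feature dictionary.
ParseFn = Callable[[str, str, str], List[dict]]
FilterFn = Callable[[dict], bool]


def unique_frame(sentence: str, verb: str, frames: Iterable[str],
                 parse: ParseFn,
                 keep: FilterFn = lambda analysis: True) -> Optional[str]:
    """Return the only frame that yields analyses passing the filter, else None."""
    counts = {}
    for frame in frames:
        # Count the analyses obtained when the verb is assumed to have this frame.
        counts[frame] = sum(1 for a in parse(sentence, verb, frame) if keep(a))
    licensed = [frame for frame, n in counts.items() if n > 0]
    # Assumed criterion: exactly one hypothetical entry leads to an analysis.
    return licensed[0] if len(licensed) == 1 else None


def collect_evidence(sentences: Iterable[str], verb: str, frames: List[str],
                     parse: ParseFn,
                     keep: FilterFn = lambda analysis: True) -> Dict[str, List[str]]:
    """Group the sentences that illustrate exactly one frame by that frame."""
    evidence: Dict[str, List[str]] = {frame: [] for frame in frames}
    for sentence in sentences:
        frame = unique_frame(sentence, verb, frames, parse, keep)
        if frame is not None:
            evidence[frame].append(sentence)
    return evidence


def toy_parse(sentence: str, verb: str, frame: str) -> List[dict]:
    """Toy stand-in for the grammar: a lookup table of invented analyses."""
    table = {
        ("s1", "subj-obj"): [{"passive": False}],
        ("s2", "subj-obj"): [{"passive": False}],
        ("s2", "subj-obj-obl"): [{"passive": False}, {"passive": False}],
        ("s3", "subj-obj-obl"): [{"passive": True}],
    }
    return table.get((sentence, frame), [])


FRAMES = ["subj-obj", "subj-obj-obl"]
unfiltered = collect_evidence(["s1", "s2", "s3"], "geben", FRAMES, toy_parse)
filtered = collect_evidence(["s1", "s2", "s3"], "geben", FRAMES, toy_parse,
                            keep=lambda a: not a["passive"])
```

In the toy data, `s2` is analysable under both hypothetical entries and is therefore discarded as ambiguous. The `keep` predicate plays the role of the filtering conditions on the feature representation: here it drops passive analyses, which removes `s3` from the evidence.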


Similar resources

Syntactic Category Learning as Iterative Prototype-Driven Clustering

We lay out a model for minimally supervised syntactic category acquisition which combines psychologically plausible concepts from standard NLP part-of-speech tagging applications with simple cognitively motivated distributional statistics. The model assumes a small set of seed words (Haghighi and Klein, 2006), an approach with motivation in Pinker's (1984) semantic bootstrapping hypothesis, an...

Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) in electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also, developing Persian tools will provide Persian progr...

Multilingual Computational Semantic Lexicons in Action: The WYSINNWYG Approach to NLP

Much effort has been put into computational lexicons over the years, and most systems give much room to (lexical) semantic data. However, in these systems, the effort put on the study and representation of lexical items to express the underlying continuum existing in 1) language vagueness and polysemy, and 2) language gaps and mismatches, has remained embryonic. A sense enumeration approach fai...

03. Tools and Procedures for the Acquisition of Morphological and Syntactic Information from Corpora

Over the past decades, the importance of the lexicon has increased in both natural language processing (NLP) and linguistic theory. Within NLP, much of the early research focused on isolated ‘toy’ tasks, treating the lexicon as a peripheral component. These days, the focus is on constructing systems suitable for the treatment of large, naturally occurring texts, and therefore rich lexical resou...

Generating Multiwords from MEDLINE in the SPECIALIST Lexicon

Multiwords are vital to better NLP systems for more effective and efficient parsers, refining information retrieval searches, enhancing precision and recall in NLP applications, etc. The Lexical Systems Group (LSG) enhanced the coverage of multiwords in the Lexicon to provide a more comprehensive resource. This paper describes a new systematic approach to lexical multiword acquisition from MEDL...



Publication date: 1998